2023-12-01
There are 1941 days' worth of data and 35 unique items in the subset of the dataset used for analysis. The data was restricted to a random sample of 10 items per department, as this is meant to be just a demonstration of what is possible.
Hierarchical Forecasting
Creates a forecast for every node [1] at every level of the hierarchy
The current iteration uses minimum trace (MinT) optimization based on the in-sample covariance for reconciliation [2].
For the most part, the forecasts seem reasonable, although they drop off sharply at the last date for unclear reasons. It's possible there's an error in the code somewhere; this will require further investigation.
These forecasts seem to be missing something. The ETS and Croston forecasts are too flat, while the best-fit, STL, and stepwise ARIMA forecasts don't quite match up with the most recent weeks. There is room for improvement here, but they aren't excessively unreasonable.
Overall, the MAE and RMSE tell a similar story. The STL forecasts seem to be the best for many of the items, and the mean forecast is often the best for others.
The overall RMSE for the reconciled forecast is 2.54, which is a 52.29% improvement compared to the baseline forecast.
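For reference, these accuracy metrics and the percent-improvement figure are straightforward to compute. This is a minimal sketch; the arrays and the baseline RMSE below are illustrative placeholders, not values from the actual analysis.

```python
import numpy as np

# Made-up actuals and forecasts, purely for illustration.
actual = np.array([4.0, 7.0, 2.0, 5.0])
forecast = np.array([5.0, 6.0, 2.0, 3.0])

mae = np.mean(np.abs(actual - forecast))           # mean absolute error
rmse = np.sqrt(np.mean((actual - forecast) ** 2))  # root mean squared error

# Percent improvement of a reconciled forecast over a baseline
# (these two RMSE values are illustrative, not from the analysis):
baseline_rmse, reconciled_rmse = 5.32, 2.54
improvement = (baseline_rmse - reconciled_rmse) / baseline_rmse * 100
```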
Let's say that historically each node is split equally between the nodes below it, e.g. half of the contracts in the United States go to the Southern Region and half go to the Northeastern Region.
Forecast at the very top level, then use proportions to divide the forecast among the lower nodes.
Let’s say we forecasted 110 for the top level, and wanted to use historical proportions where each node contributes equally to the node above.
You can just divvy up the forecast based on the expected proportions at each level, starting at the top and going down.
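The top-down approach can be sketched with the numbers above (110 at the top, equal splits at each level). The specific two-region, two-states-per-region tree here is an illustrative assumption, not the exact hierarchy from the post.

```python
# Top-down disaggregation with fixed historical proportions.
top_forecast = 110.0

# Each node historically contributes equally to its parent.
region_prop = {"Southern": 0.5, "Northeastern": 0.5}
state_prop = {"Southern": {"Tennessee": 0.5, "Florida": 0.5},
              "Northeastern": {"Connecticut": 0.5, "New York": 0.5}}

# Work down the tree, multiplying each parent's forecast by its
# children's proportions.
regions = {r: top_forecast * p for r, p in region_prop.items()}
states = {s: regions[r] * p
          for r, props in state_prop.items()
          for s, p in props.items()}
```

By construction the disaggregated forecasts are coherent: the states sum back to 110, so no separate reconciliation step is needed.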
Forecast at the bottom level, then just add them up to get the higher levels of aggregation.
Let's say you forecast 5 for each county.
You’d then add up each node to get the number for the node above until you get to the top of the hierarchy.
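The bottom-up approach is just summation up the tree. A minimal sketch, where the county-to-state mapping is an illustrative assumption and every county is forecast at 5 as in the example above:

```python
# Bottom-level forecasts, keyed by (state, county).
county_forecasts = {
    ("Tennessee", "Davidson"): 5, ("Tennessee", "Shelby"): 5,
    ("Florida", "Duval"): 5, ("Florida", "Orange"): 5,
}

# Sum each node's children to get the parent, repeating up the hierarchy.
state_forecasts = {}
for (state, _county), value in county_forecasts.items():
    state_forecasts[state] = state_forecasts.get(state, 0) + value

total_forecast = sum(state_forecasts.values())  # top of the hierarchy
```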
Forecast at some point in the middle, then add up to get higher levels of aggregation, and use proportions to divide up the forecasts to lower nodes. For this example we’ll use historical proportions.
Let’s say you forecasted the states, 10 for Tennessee, 15 for Florida, and 30 for Connecticut.
You would multiply each of those forecasts by the historical proportions to get the values for the nodes below
Then you would add to get the values of higher nodes.
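The middle-out approach combines both directions, starting from the state forecasts above (10, 15, 30). The county proportions and region grouping here are illustrative assumptions:

```python
state_forecasts = {"Tennessee": 10.0, "Florida": 15.0, "Connecticut": 30.0}

# Downward: multiply each state forecast by its counties' historical
# proportions.
county_props = {"Tennessee": {"Davidson": 0.5, "Shelby": 0.5},
                "Florida": {"Duval": 0.4, "Orange": 0.6},
                "Connecticut": {"Hartford": 0.5, "Fairfield": 0.5}}
county_forecasts = {(s, c): state_forecasts[s] * p
                    for s, props in county_props.items()
                    for c, p in props.items()}

# Upward: sum states into regions, then regions into the total.
regions = {"Southern": ["Tennessee", "Florida"], "Northeastern": ["Connecticut"]}
region_forecasts = {r: sum(state_forecasts[s] for s in states)
                    for r, states in regions.items()}
total_forecast = sum(region_forecasts.values())
```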
Forecast at every level, then use linear algebra to minimize the errors across all levels. It does this by minimizing the trace of a covariance matrix, and there are several matrices you can use for this depending on the data you have. If you want to learn more, I'd suggest checking out Forecasting: Principles and Practice [3]. It gives a good overview of the subject, and from there you can decide whether you want to read the papers on it (unfortunately I haven't found any good videos that explain the math behind it to recommend).
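The trace-minimization idea can be sketched with plain numpy on a toy two-series hierarchy (Total = A + B). Using the identity matrix for the covariance, as here, reduces MinT to its OLS special case; the base forecasts are made up. This is only a sketch of the formula, not the fable implementation used in the post.

```python
import numpy as np

# S maps the bottom-level series to every node in the hierarchy.
S = np.array([[1.0, 1.0],   # Total
              [1.0, 0.0],   # A
              [0.0, 1.0]])  # B
W = np.eye(3)  # forecast-error covariance (identity => OLS special case)

# Base forecasts at every level; note they are incoherent: 45 + 50 != 100.
y_hat = np.array([100.0, 45.0, 50.0])

# MinT reconciliation: y_tilde = S (S' W^-1 S)^-1 S' W^-1 y_hat
Winv = np.linalg.inv(W)
G = np.linalg.inv(S.T @ Winv @ S) @ S.T @ Winv  # maps base forecasts to bottom level
y_tilde = S @ (G @ y_hat)  # reconciled forecasts, coherent by construction
```

After reconciliation, the Total forecast exactly equals the sum of the reconciled A and B, and the adjustment is spread across levels rather than forced onto one of them.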
- Back-tested all plausible models
- Fitted values were rounded
- Negative fitted values were changed to zero
Data:
- https://github.com/Mcompetitions/M5-methods, accessed December 1, 2023
Math and major packages:
Hyndman, R.J., & Athanasopoulos, G. (2021) Forecasting: principles and practice, 3rd edition, OTexts: Melbourne, Australia. OTexts.com/fpp3. Accessed on October 10, 2023.
O'Hara-Wild, M., Hyndman, R.J., & Wang, E. (2023). fable: Forecasting Models for Tidy Time Series. R package version 0.3.3. https://CRAN.R-project.org/package=fable
Wickramasuriya, S. L., Athanasopoulos, G., & Hyndman, R. J. (2019). Optimal forecast reconciliation for hierarchical and grouped time series through trace minimization. Journal of the American Statistical Association, 114(526), 804–819. DOI
To see my code for this, check out my notebook here